Abstract — Much research has been conducted to securely
outsource multiple parties' data aggregation to an untrusted 
aggregator without disclosing each individual's privately owned 
data, or to enable multiple parties to jointly aggregate their 
data while preserving privacy. However, those works either 
require secure pair-wise communication channels or suffer from 
high complexity. In this paper, we consider how an external 
aggregator or multiple parties can learn some algebraic statistics 
(e.g., sum, product) over participants' privately owned data 
while preserving the data privacy. We assume all channels are 
subject to eavesdropping attacks, and all the communications 
throughout the aggregation are open to others. We propose 
several protocols that successfully guarantee data privacy under 
this weak assumption while limiting both the communication and 
computation complexity of each participant to a small constant. 

Index Terms — Privacy, aggregation, secure channels, SMC, 
homomorphic. 

I. Introduction 

The privacy-preserving data aggregation problem has long
been an active research topic in the field of applied cryptography.
In numerous real life applications such as crowd sourcing or 
mobile cloud computing, individuals need to provide their sen- 
sitive data (location-related or personal-information-related) to 
receive specific services from the entire system (e.g., location 
based services or mobile based social networking services). 
There are usually two different models in this problem: 1) 
an external aggregator collects the data and wants to conduct 
an aggregation function on participants' data (e.g., crowd 
sourcing); 2) participants themselves are willing to jointly 
compute a specific aggregation function whose input data is 
co-provided by themselves (e.g., social networking services). 
However, the individual's data should be kept secret, and the 
aggregator or other participants are not supposed to learn any 
useful information about it. Secure Multi-party Computation 
(SMC), Homomorphic Encryption (HE) and other crypto- 
graphic methodologies can be partially or fully exploited to 
solve this problem, but they are subject to some restrictions 
in this problem. 

Secure Multi-party Computation (SMC) was first formally
introduced by Yao [22] in 1982 as Secure Two-Party Computation.
Generally, it enables n parties to jointly and
privately compute a function

$f(x_1, x_2, \cdots, x_n) = \{y_1, y_2, \cdots, y_n\}$

where $x_i$ is the input of participant i, and the result $y_i$ is
returned to participant i only. Each result can depend on
all inputs $x_i$, and each participant i learns nothing but his
own result $y_i$. One could let the function in SMC output only
one uniform result to all or some of the participants, namely the
algebraic aggregation of their input data. The privacy-preserving
data aggregation problem would then seem to be solved by
this approach. However, this does not completely solve
our problem, because synchronous SMC (e.g., [13]) requires
interactive invocation among participants, which leads to
high communication and computation complexity; these costs are
compared in Section VIII. Even in asynchronous
SMC, the computation complexity is still too high for practical
applications.

Homomorphic Encryption (HE) allows direct addition and
multiplication of ciphertexts while preserving decryptability.
That is, $Enc(m_1) \otimes Enc(m_2) = Enc(m_1 \times m_2)$, where
$Enc(m)$ stands for the ciphertext of m, and $\otimes$, $\times$ refer to
the homomorphic operations on the ciphertexts and plaintexts
respectively. One could also try to solve our problem using
this technique, but HE uses the same decryption key for the
original data and the aggregated data. That is, the operator
who executes homomorphic operations upon the ciphertexts
is not authorized to obtain the final result. This forbids the
aggregator from decrypting the aggregated result, because if
the aggregator were allowed to decrypt the final result, he could also
decrypt the individual ciphertexts received, which contradicts
our motivation. Also, because the size of the plaintext space is
limited, the number of addition and multiplication operations
executed upon ciphertexts was limited until Gentry et al.
proposed a fully homomorphic encryption scheme [11] and
implemented it in [12]. However, Lauter et al. pointed out
in [16] that the complexity of general HE is too high for use
in real applications. Lauter et al. also proposed an HE scheme which
sacrifices the possible number of multiplications for speed, but it
still needs too much time to execute homomorphic operations
on ciphertexts.

Secure Multi-party Computation
  Pros: different outputs for different participants
  Cons: high complexity due to the computation based on garbled circuit;
        frequent interactions required for synchronous SMC

Homomorphic Encryption
  Pros: efficient if # of multiplications is restricted
  Cons: decrypter can decrypt both aggregated data and individual data;
        trade-off between # of multiplications and complexity exists

Besides the aforementioned drawbacks, both SMC and
HE require an initialization phase during which participants
request keys from key issuers via a secure channel. This could be
a security hole since the security of those schemes relies on the
assumption that keys are disclosed to authorized participants
only. In this paper, we revisit the classic privacy preserving
data aggregation problem. Our goal is to design efficient
protocols without relying on a trusted authority or secure
pair-wise communication channels. The main contributions of
this paper are:

• Formulation of a model without secure channel: Different 
from many other models in privacy-preserving data ag- 
gregation problem, our model does not require a secure 
communication channel throughout the protocol. 

• Efficient protocol in linear time: The total communication 
and computation complexity of our work is proportional 
to the number of participants n, while the complexities 
of many similar works are proportional to $n^2$. We do not
use complicated encryption protocols, which makes our 
system much faster than other proposed systems. 

• General Multivariate Polynomial Evaluation: We general- 
ize the privacy-preserving data aggregation to secure mul- 
tivariate polynomial evaluation whose inputs are jointly 
provided by multiple parties. That is, our scheme enables
multiple parties to securely compute

$f(\mathbf{x}) = \sum_{k=1}^{m} ( c_k \prod_{i=1}^{n} x_i^{d_{i,k}} )$

where the data $x_i$ is privately known by user i.

Note that our general format of data aggregation can be
directly used to express various statistical values. For example,
$\sum_{i=1}^{n} x_i$ can easily be computed while preserving privacy, and
thus the mean $\mu = \sum_{i=1}^{n} x_i / n$ can be computed in a
privacy-preserving manner. Given the mean $\mu$,
$n\mu^2 + \sum_{i=1}^{n} (x_i^2 - 2 x_i \mu)$ can
be obtained from the polynomial, and this divided by n is
the population variance. Similarly, other statistical values are
also achievable (e.g., sample skewness, k-th moment, mean
square weighted deviation, regression, and randomness test)
based on our general multi-variate polynomial. Although our
methods are proposed for computing the value of a multi- 
variate polynomial function where the input of each participant 
is assumed to be an integer, our methods can be generalized 
for functions (such as dot product) where the input of each 
participant is a vector. 
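
As a small worked illustration of the variance expression above, the following sketch computes the population variance from only two aggregates, the sum of the inputs and the sum of their squares. The function name `population_variance` and the sample data are hypothetical; in the protocols of this paper the two aggregates would be obtained privately rather than from a plain list.

```python
from fractions import Fraction

def population_variance(sum_x, sum_x_squared, n):
    """Variance from the two aggregates sum(x_i) and sum(x_i^2)."""
    mu = Fraction(sum_x, n)
    # n*mu^2 + sum(x_i^2 - 2*x_i*mu), with the sum expanded term by term
    numerator = n * mu ** 2 + sum_x_squared - 2 * mu * sum_x
    return numerator / n

data = [3, 5, 8, 10]  # illustrative inputs only
variance = population_variance(sum(data), sum(x * x for x in data), len(data))
print(variance)  # 29/4
```

Exact fractions are used so that the identity can be checked without floating-point error.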

The rest of the paper is organized as follows. We present
the system model and necessary background in Section III. In
Section IV, we analyze the number of communications needed
over secure communication channels when users communicate
randomly. We first address privacy preserving sum
and product in Section V by presenting two efficient protocols.
Based on these protocols, we then present an efficient
protocol for general multi-variate polynomial evaluation in
Section VI. In Section VII, we present detailed analysis of
the correctness, complexity, and security of our protocols.
Performance evaluation of our protocols is reported in Section
VIII, where we compare our protocols with the ones based on SMC
or HE. We then conclude the paper with a discussion of
some future work in Section IX.

II. Related Work 

Many novel protocols have been proposed for privacy-preserving
data aggregation or, in general, secure multi-party
computation. Castelluccia et al. [5] presented a provably secure
and efficient aggregation of encrypted data in WSN, which is
extended from [6]. They designed a symmetric key homomorphic
encryption scheme which is additively homomorphic to
conduct the aggregation operations on the ciphertexts. Their
scheme uses modular addition, so the scheme is good for CPU-bounded
devices such as sensor nodes in WSN. Their scheme
can also efficiently compute various statistical values such as
mean, variance and deviation. However, since they used
symmetric homomorphic encryption, their aggregator could
decrypt each individual sensor's data, and they assumed a
trusted aggregator in their model.

Sheikh et al. [19] proposed a k-secure sum protocol, which
is motivated by the work of Clifton et al. [7]. They significantly
reduced the probability of data leakage in [7] by
segmenting the data block of each individual party and distributing
the segments to other parties. Here, the sum of each party's segments
is his data, therefore the final sum of all segments is the sum
of all parties' data. This scheme can be easily converted to a
k-secure product protocol by converting each addition to a
multiplication. Similar to our protocol, one can combine their sum
protocol and the converted product protocol to achieve a
privacy-preserving multivariate polynomial evaluation protocol. However,
pair-wise unique secure communication channels must
be given between each pair of users such that only the receiver
and the sender know the transmitted segment. Otherwise, each
party's secret data can be calculated by performing O(k)
computations. In this paper, we remove the limitation of using
secure communication channels.

The work of He et al. [14] is similar to Sheikh et al.'s
work. They proposed two privacy-preserving data aggregation
schemes for wireless sensor networks: the Cluster-Based Private
Data Aggregation (CPDA) and the Slice-Mix-AggRegaTe
(SMART). In CPDA, sensor nodes form clusters randomly and
collectively compute the aggregate result within each cluster.
In the improved SMART, each node segments its data into n
slices and distributes $n - 1$ slices to the nearest nodes via a secure
channel. However, these schemes only support additions, and since each
data is segmented, the communication overhead per node is linear
in the number of slices n.

Shi et al. [20] proposed a construction in which n participants
periodically upload encrypted values to an aggregator, and the
aggregator computes the sum of those values without learning
anything else. This scheme is close to our solution to the
multivariate polynomial evaluation problem, but they assumed
a trusted key dealer in their model. The key dealer distributes a
random key $k_i$ to participant i and a key $k_0$ to the aggregator,
where $\prod_{i=0}^{n} k_i = 1$, and the ciphertext is in the format of
$C_i = k_i \cdot g^{x_i}$. Here, g is a generator, $k_i$ is a participant's key
and $x_i$ is his data (for $i = 1, 2, \cdots, n$). Then, the aggregator can
recover the sum $\sum_{i=1}^{n} x_i$ from the ciphertexts received from all of
the participants. He computes $k_0 \prod_{i=1}^{n} C_i$ to get $g^{\sum_{i=1}^{n} x_i}$, and
uses brute-force search to find $\sum_{i=1}^{n} x_i$ or uses Pollard's
lambda method [18] to calculate it. This kind of brute-force
decryption limits the plaintext space due to the hardness
of the discrete logarithm problem; otherwise no deterministic
algorithm can decrypt their ciphertext in polynomial time. The
security of their scheme relies on the security of the keys $k_i$.

In our scheme, the trusted aggregator in [5] is removed
since data privacy against the aggregator is also a top concern
these days. Unlike [14], [19], we assume insecure channels,
which enables us to get rid of expensive and vulnerable key
pre-distribution. We do not segment each individual's data, and
our protocols only incur constant communication overhead for
each participant. Our scheme is also based on the hardness of
the discrete logarithm problem like [20], but we do not simply
employ brute-force search in decryption; instead, we employ
our novel efficient protocols for sum and product calculation.

III. System Models and Preliminary 

A. System Model and Problem Definition 

Assume that there are n participants $\{p_1, p_2, \cdots, p_n\}$, and
each participant $p_i$ has a privately known data $x_i$ from a
group $G_1$. The privacy-preserving data aggregation problem
(or secure multivariate polynomial evaluation problem) is to
compute some multivariate polynomial of the $x_i$'s jointly or by
an aggregator while preserving the data privacy. Assume that
there is a group of m powers $\{d_{i,k} \in Z_q \mid k = 1, 2, \cdots, m\}$
for each $p_i$ and m coefficients $\{c_k \mid k = 1, \cdots, m, c_k \in G_1\}$.
The objective of the aggregator or the participants is to compute
the following polynomial without knowing any individual
$x_i$:

$f(\mathbf{x}) = \sum_{k=1}^{m} ( c_k \prod_{i=1}^{n} x_i^{d_{i,k}} )$    (1)

Here vector $\mathbf{x} = (x_1, x_2, \cdots, x_n)$. For simplicity, we assume
that the final result $f(\mathbf{x})$ is positive and bounded from above
by a large prime number P. We assume all of the powers
$d_{i,k}$'s and coefficients $c_k$'s are open to any participant as well
as the attackers. This is a natural assumption since the powers
and coefficients uniquely determine a multivariate polynomial,
and the polynomial is supposed to be public.
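
To make the notation concrete, here is a minimal sketch of the target function evaluated in the clear, with no privacy at all. The function name `evaluate_polynomial` and the layout of the powers as a nested list `d[i][k]` are assumptions made for illustration; they are not part of the paper's protocols.

```python
from math import prod

def evaluate_polynomial(x, c, d):
    """Plain evaluation of f(x) = sum_k c[k] * prod_i x[i] ** d[i][k]."""
    m = len(c)
    return sum(
        c[k] * prod(x_i ** d_i[k] for x_i, d_i in zip(x, d))
        for k in range(m)
    )

# Two participants, two terms: f(x) = 1 * x1 * x2^2 + 5 * x2
x = [2, 3]
c = [1, 5]
d = [[1, 0],   # powers of x1 in terms 1 and 2
     [2, 1]]   # powers of x2 in terms 1 and 2
print(evaluate_polynomial(x, c, d))  # 33
```

The privacy-preserving protocols aim to produce the same value without any party seeing another party's entry of `x`.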

We employ two different models in this paper: One Aggre- 
gator Model and Participants Only Model. These two models 
are general cases we are faced with in real applications. 

One Aggregator Model: In the first model, we have one
aggregator A who wants to compute the function $f(\mathbf{x})$. We
assume the aggregator is untrusted and curious. That is, he
always eavesdrops on the communications between participants
and tries to harvest their input data. We also assume participants
do not trust each other and that they are curious as well;
however, they will follow the protocol in general. We could
also consider having multiple aggregators, but this is a simple
extension which can be trivially achieved from our first model.
We call this model the One Aggregator Model. Note that in
this model, no single participant $p_i$ is allowed to compute
the final result $f(\mathbf{x})$.

Participants Only Model: The second model is similar to
the first one except that there are n participants only and there
is no aggregator. In this model, all the participants are equal
and they all will calculate the final aggregation result $f(\mathbf{x})$.
We call this model the Participants Only Model.

In both models, participants are assumed not to collude with
each other. Relaxing this assumption is left as future work.

B. Additional Assumptions 

We assume that all the communication channels in our 
protocol are insecure. Anyone can eavesdrop them to inter- 
cept the data being transferred. To address the challenges of 
insecure communication channel, we assume that the discrete 
logarithm problem is computationally hard if: 1) the orders of 
the integer groups are large prime numbers; 2) the involved 
integer numbers are large numbers. The security of our scheme 
relies on this assumption. We further assume that there is
a secure pseudorandom function (PRF) which can choose
a random element from a group such that this element is
computationally indistinguishable from a uniformly random one.

We also assume that user authentication is in place to
authenticate each participant if needed. We note that Dong et al.
[9] investigated verifiable privacy-preserving dot product of
two vectors and Zhang et al. [24] proposed verifiable multi-party
computation, both of which can be partially or fully
exploited later. Designing privacy preserving data aggregation
that also verifies the correctness of the provided
data is left as future work.

C. Discrete Logarithm Problem 

Let $G \subset Z_p$ be a cyclic multiplicative integer group, where
p is a large prime number, and g be a generator of it. Then, for
all $h \in G$, h can be written as $h = g^k$ for some integer k, and
any two such integers are congruent modulo the order of G. The discrete
logarithm problem is defined as follows: given an element $h \in G$, find
the integer k such that $g^k = h$.

The famous Decision Diffie-Hellman (DDH) problem proposed
by Diffie and Hellman in [8] is derived from this
assumption. The DDH problem is widely exploited in the field
of cryptography (e.g., El Gamal encryption [10] and other
cryptographic security protocols such as CP-ABE [3]) as
discussed in [4]. Our protocol is based on the assumption that
it is computationally expensive to solve the discrete logarithm
problem as in other similar research works ( (1131 . ifPTI . lE4l ). 

IV. Achieving Sum Under Secured Communication
Channel

Before introducing our aggregation scheme without secure
communication channel, we first describe the basic idea of
randomized secure sum calculation under secured communication
channels (it can be trivially converted to secure product
calculation). The basic idea came from Clifton et al. [7], which
is also reviewed in [21], but we found that their setting imposed
unnecessary communication overhead, and we reduced it
while maintaining the same security level. Assume participants
$p_1, p_2, \cdots, p_n$ are arranged in a ring for computation purposes.
Each participant breaks its privately owned data block
$x_i$ into k segments $s_{i,j}$ such that the sum of all k segments
is equal to the value of the data block. The value of each
segment is randomly decided. For sum, we can simply assign
random values to segments $s_{i,j}$ ($1 \le j \le k - 1$) and let
$s_{i,k} = x_i - \sum_{j=1}^{k-1} s_{i,j}$. A similar method can be used for
product. In this scheme, each participant randomly selects
$k - 1$ participants and transmits to each of those participants a
distinct segment $s_{i,j}$. Thus at the end of this redistribution
each participant holds several segments, of which one
segment belongs to itself and the rest belong to some other
participants. The receiving participant adds all its received
segments and transmits its result to the next participant in
the ring. This process is repeated until all the segments of all
the participants are added and the sum is announced by the
aggregator.
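
The segmentation idea described above can be sketched as follows. This is a toy simulation under stated assumptions, not the authors' implementation: the helper names, the segment bound, and the use of a seeded `random.Random` are all hypothetical, and the secure channels are simply modeled as direct list updates.

```python
import random

def split_into_segments(x, k, rng, bound=10**6):
    """Split x into k segments that sum to x; the first k-1 are random."""
    segments = [rng.randrange(-bound, bound) for _ in range(k - 1)]
    segments.append(x - sum(segments))
    return segments

def ring_secure_sum(data, k, seed=0):
    """Simulate redistribution of segments followed by a pass around the ring."""
    rng = random.Random(seed)
    n = len(data)
    held = [0] * n  # running total of the segments each participant holds
    for i, x in enumerate(data):
        segments = split_into_segments(x, k, rng)
        held[i] += segments[-1]  # one segment stays with its owner
        others = [j for j in range(n) if j != i]
        receivers = rng.sample(others, k - 1)  # requires k - 1 <= n - 1
        for j, seg in zip(receivers, segments):
            held[j] += seg  # stands in for a secure pair-wise channel
    running = 0
    for partial in held:  # each participant adds its part and forwards
        running += partial
    return running

print(ring_secure_sum([4, 8, 15, 16, 23, 42], k=3))  # 108
```

The result is independent of the random choices, since every segment is added exactly once.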

Recall that there are n participants and each participant
randomly selects $k - 1$ participants to distribute its segments.
Clearly, a larger k provides better computation privacy; however,
it also causes larger communication overhead, which is
not desirable. In the rest of this section, we are interested in
finding an appropriate k in order to reduce the communication
cost while preserving computation privacy.

In particular, we aim at selecting the smallest k to ensure
that each participant holds at least one segment from the other
participants after redistribution. We can view this problem
as placing identical and indistinguishable balls into n
distinguishable (numbered) bins. This problem has been extensively
studied and is well understood, and the following lemma can be
proved by a simple union bound:

Lemma IV.1. Let $\epsilon \in (0, 1)$ be a constant. If we randomly
place $(1 + \epsilon) n \ln n$ balls into n bins, with probability at least
$1 - \frac{1}{n^{\epsilon}}$, all the n bins are filled.

Assume that each participant will randomly select $k - 1$
participants (including itself) for redistribution. By treating
each round of redistribution as one trial in the coupon collector's
problem, we are able to prove that each participant only needs
to redistribute $((1 + \epsilon) n \ln n)/n = (1 + \epsilon) \ln n$ segments to
other participants to ensure that every participant receives at
least one segment with high probability. However, different
from the previous assumption, each participant selects $k - 1$
participants other than itself to redistribute its segments in our
scheme. Therefore, we need one more round of redistribution for
each participant to ensure that every participant will receive at
least one copy from other participants with high probability.

Theorem IV.2. Let $\epsilon \in (0, 1)$ be a constant. If each participant
randomly selects $(1 + \epsilon) \ln n + 1$ participants to
redistribute its segments, with probability at least $1 - \frac{1}{n^{\epsilon}}$,
each participant receives at least one segment from the other
participants.
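
A quick empirical check of the bound can be sketched as below. The rounding with `math.ceil`, the trial count, and the fixed seed are assumptions of this sketch; it only estimates how often every participant receives at least one segment when each one picks that many recipients other than itself.

```python
import math
import random

def recipients_needed(n, eps):
    """Number of recipients per participant, (1 + eps) * ln(n) + 1, rounded up."""
    return math.ceil((1 + eps) * math.log(n)) + 1

def everyone_receives(n, k_recipients, rng):
    """One trial: does every participant get a segment from somebody else?"""
    received = [False] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j in rng.sample(others, k_recipients):
            received[j] = True
    return all(received)

def coverage_rate(n, eps, trials=200, seed=1):
    """Fraction of simulated trials in which all participants are covered."""
    rng = random.Random(seed)
    k = recipients_needed(n, eps)
    hits = sum(everyone_receives(n, k, rng) for _ in range(trials))
    return hits / trials

print(recipients_needed(100, 0.5))  # 8
print(coverage_rate(100, 0.5))      # empirical rate, compare with 1 - 1/n**eps
```

For n = 100 and a constant of 0.5 the bound guarantees a rate of at least 0.9, and the simulated rate can be compared against that value.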

This theorem reveals that by setting k to the order of $\ln n$,
we are able to preserve the computation privacy. Compared
with the traditional secure sum protocol, our scheme dramatically
reduces the communication complexity. However, we assume
that the communication channels among participants are secure
in the above scheme. In the rest of this paper, we tackle the
secure aggregation problem under unsecured channels.

V. Efficient Protocols for Sum and Product 

In this section, we present two novel calculation protocols
for each model which preserve each individual's data privacy.
These four protocols will serve as the bases of our solution to the
privacy-preserving data aggregation problem. For simplicity,
we assume all coefficients $c_k$ ($k \in [1, m]$) and powers
$d_{i,k}$ ($i \in [1, n], k \in [1, m]$) of the polynomial
$f(\mathbf{x}) = \sum_{k=1}^{m} c_k (\prod_{i=1}^{n} x_i^{d_{i,k}})$
are known to every participant $p_i$. Table
I summarizes the main notations used in this paper.



TABLE I

Notations of symbols used in our protocols

$p_i$          i-th participant in data aggregation
A              Aggregator
$G_1, G_2$     multiplicative cyclic integer groups
$g_1, g_2$     generators of above groups
$d_{i,k}$      power of $x_i$
$c_k$          coefficient in $c_k \prod_{i=1}^{n} x_i^{d_{i,k}}$
$r_i, R_i$     randomly chosen numbers



A. Product Protocol - Participants Only Model 

Firstly, we assume that all participants together want to
compute the value $f(\mathbf{x}) = \prod_{i=1}^{n} x_i$ given their privately known
values $x_i \in Z_p$. The basic idea of our protocol is to find some
random integers $R_i \in Z_p$ such that $\prod_{i=1}^{n} R_i = 1 \mod p$ and
the user $p_i$ can compute the random number $R_i$ easily while it
is computationally expensive for other participants to compute
the value $R_i$.

Let $G_1 \subset Z_p$ be a cyclic multiplicative group of prime
order q and $g_1$ be its generator. Then our protocol for privacy
preserving product $\prod_i x_i$ has the following steps: Setup,
Encrypt, Product.

Setup $\rightarrow r_i \in Z_q, R_i = (g_1^{r_{i+1}} / g_1^{r_{i-1}})^{r_i} \in G_1$

We assume all participants are arranged in a ring for
computation purposes. The ring can be formed according to
the lexicographical order of the MAC addresses or even the
geographical locations; how to form it is outside the scope of
this paper. Each $p_i$ ($i \in \{1, \cdots, n\}$) randomly chooses a secret
integer $r_i \in Z_q$ using the PRF and calculates a public parameter
$g_1^{r_i} \in G_1$. Then, each $p_i$ shares $Y_i = g_1^{r_i} \mod p$ with $p_{i-1}$
and $p_{i+1}$ (here $p_{n+1} = p_1$ and $p_0 = p_n$).

After a round of exchanges, the participant $p_i$ computes the
number $R_i = (Y_{i+1} / Y_{i-1})^{r_i} = (g_1^{r_{i+1}} / g_1^{r_{i-1}})^{r_i} \mod p$ and
keeps this number secret. Note that $p_1$ calculates $(g_1^{r_2} / g_1^{r_n})^{r_1}$
and $p_n$ calculates $(g_1^{r_1} / g_1^{r_{n-1}})^{r_n}$.

[Fig. 1. Communications in Setup: (a) Participants only model, (b) One aggregator model]
Encrypt($x_i$) $\rightarrow C_i \in G_1$

When a product is needed, every $p_i$ creates the ciphertext:

$C_i := x_i \cdot R_i = x_i \cdot (g_1^{r_{i+1}} / g_1^{r_{i-1}})^{r_i} \mod p$

where $x_i$ is his private input data. If he does not want to
participate in the multiplication, he can simply set $x_i := 1$.
Then, he broadcasts this ciphertext.

[Fig. 2. Communications in Encrypt: (a) Participants only model, (b) One aggregator model]

Product($\{C_1, C_2, \cdots, C_n\}$) $\rightarrow \prod_{i=1}^{n} x_i \in G_1$

Any $p_i$, after receiving the n ciphertexts $\{C_1, C_2, \cdots, C_n\}$
from all of the $p_i$'s, calculates the following product:

$\prod_{i=1}^{n} C_i = \prod_{i=1}^{n} x_i \mod p$

To make sure that we obtain the correct result $\prod_{i=1}^{n} x_i$
without modular reduction, we can choose p to be large enough, say
$p > M^n$, where M is a known upper bound on $x_i$.

B. Product Protocol - One Aggregator Model 

We use the same group used in Participants Only Model.
Everything is the same as in the protocol above, except that the
aggregator A acts as the (n + 1)-th participant $p_{n+1}$. In
other words, there are n + 1 "participants" now. The second
difference is that each participant $p_i$ sends the ciphertext
$C_i$ to the aggregator, instead of broadcasting it to all participants.
The aggregator A does not announce its random number
$R_{n+1} = (g_1^{r_1} / g_1^{r_n})^{r_{n+1}}$ to any regular participant.

Each participant $p_i$, $i \in [1, n]$, sends the ciphertext
$C_i = R_i \cdot x_i$ to the aggregator A. The aggregator A then calculates

$(g_1^{r_1} / g_1^{r_n})^{r_{n+1}} \prod_{i=1}^{n} C_i = \prod_{i=1}^{n} x_i \mod p$

to obtain the final product, where $r_{n+1}$ is the random number
generated by A.
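
A minimal sketch of the One Aggregator variant follows, assuming the same toy modulus and base as any small demonstration would (both are assumptions of this sketch, as are the function names). The aggregator occupies the last position of a ring of n + 1 members and multiplies the received ciphertexts by its own secret mask.

```python
import secrets

P = 2**61 - 1  # toy prime modulus (assumption of this sketch)
G = 3          # fixed base, not verified to be a generator

def ring_masks(n_total):
    """Secret masks R_i for a ring of n_total members (participants + aggregator)."""
    r = [secrets.randbelow(P - 2) + 1 for _ in range(n_total)]
    Y = [pow(G, r_i, P) for r_i in r]
    masks = []
    for i in range(n_total):
        ratio = Y[(i + 1) % n_total] * pow(Y[(i - 1) % n_total], -1, P) % P
        masks.append(pow(ratio, r[i], P))
    return masks

def aggregate_product(inputs):
    """Aggregator recovers prod(inputs) from the ciphertexts and its own mask."""
    n = len(inputs)
    masks = ring_masks(n + 1)  # index n plays the role of the aggregator
    ciphertexts = [x * masks[i] % P for i, x in enumerate(inputs)]
    result = masks[n]  # R_{n+1} = (g^r_1 / g^r_n)^r_{n+1}
    for c in ciphertexts:
        result = result * c % P
    return result

print(aggregate_product([3, 7, 11, 2]))  # 462
```

Without the aggregator's mask the product of the n ciphertexts stays blinded, which is why a regular participant cannot recover the result in this model.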

C. Sum Protocol - Participants Only Model 

Here we assume that all participants together want to
compute the value $f(\mathbf{x}) = \sum_{i=1}^{n} x_i$ given their privately
known values $x_i \in Z_p$. It seems that we can still exploit
the method used for computing product by finding random
numbers $R_i$ such that $\sum_{i=1}^{n} R_i = 0$. We found that it is
challenging to find such numbers $R_i$ while preserving privacy
and security. The basic idea of our protocol is to convert the
sum of numbers into a product of numbers. The previous solution
[20] essentially applied this approach as well, by computing the
product $\prod_{i=1}^{n} g^{x_i} = g^{\sum_{i=1}^{n} x_i}$ and then finding $\sum_{i=1}^{n} x_i$ by
computing the discrete logarithm of the product. As computing the discrete
logarithm is computationally expensive, we do not adopt this
method. Instead, we propose a computationally efficient method
here.

In a nutshell, we exploit the modular property below to
achieve the privacy preserving sum protocol.

$(1 + p)^m = \sum_{i=0}^{m} \binom{m}{i} p^i = 1 + mp \mod p^2$    (2)

From Equation (2), we conclude that

$\prod_{i=1}^{n} (1 + p)^{x_i} = \prod_{i=1}^{n} (1 + p \cdot x_i) = (1 + p \sum_{i=1}^{n} x_i) \mod p^2$.
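
The modular identity can be checked numerically with a few lines. The small prime 101 and the function name `sum_via_products` are assumptions of this sketch; no masking is applied here, so it only demonstrates how a sum is read out of a product modulo the square of the prime.

```python
def sum_via_products(xs, p):
    """Recover sum(xs) mod p from the product of (1 + p)^x_i modulo p^2."""
    p_squared = p * p
    c = 1
    for x in xs:
        c = c * pow(1 + p, x, p_squared) % p_squared
    return (c - 1) // p % p

p = 101  # small prime chosen for illustration only
# Equation (2): (1 + p)^m = 1 + m*p (mod p^2)
assert all(pow(1 + p, m, p * p) == (1 + m * p) % (p * p) for m in range(500))
print(sum_via_products([5, 17, 30], p))  # 52
```

The recovered value is the sum modulo p, so p must exceed the true sum for the result to be exact.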

Our protocol works as follows. Let $G_2 \subset Z_{p^2}$ be a cyclic
multiplicative group of order $p(p - 1)$ and $g_2$ be its generator,
where p is a prime number. Then our protocol for privacy
preserving summation $\sum_i x_i$ has the following steps: Setup,
Encrypt, Sum.

Setup $\rightarrow r_i \in Z_{pq}, R_i = (g_2^{r_{i+1}} / g_2^{r_{i-1}})^{r_i}$

Remember that participants are arranged in a circle. $p_i$
uses the PRF to randomly pick a secret number $r_i \in Z_{pq}$, and
calculates a public parameter $g_2^{r_i}$. Then, he shares $g_2^{r_i}$ with
$p_{i+1}$ and $p_{i-1}$. Similar to the product calculation protocol, $p_n$
shares his public parameter with $p_{n-1}$ and $p_1$, and $p_1$
shares his public parameter with $p_2$ and $p_n$.

After a round of exchanges, each $p_i$ calculates $R_i =
(g_2^{r_{i+1}} / g_2^{r_{i-1}})^{r_i}$ and keeps this secret.

Encrypt($x_i$, $R_i$) $\rightarrow C_i \in G_2$

This algorithm crosses over two different integer groups:
$G_1$ and $G_2$. Each $p_i$ first calculates $(1 + x_i \cdot p) \mod p^2$. Note
that $x_i \in G_1$, and it is temporarily treated as an element in
$G_2$, but this does not affect the final value of the result since
operations in $G_2$ are modulo $p^2$. Then, he multiplies the secret
parameter $R_i = (g_2^{r_{i+1}} / g_2^{r_{i-1}})^{r_i}$ to it to get the ciphertext:

$C_i = (1 + x_i \cdot p) \cdot R_i$

Finally, each participant broadcasts his ciphertext to all the
others.

Sum($\{C_1, C_2, \cdots, C_n\}$) $\rightarrow \sum_{i=1}^{n} x_i \in G_1$

Each participant, after receiving the ciphertexts from all of
the other participants, calculates the following $C \in G_2$:

$C = \prod_{i=1}^{n} C_i = (1 + p \sum_{i=1}^{n} x_i) \mod p^2$

Then, he calculates $(C - 1)/p \mod p = \sum_{i=1}^{n} x_i \mod p$ to
recover the final sum.

D. Sum Protocol - One Aggregator Model 

Similar to the product protocol for One Aggregator Model,
everything is the same except that A acts as the (n + 1)-th participant
in this model. The participants send their ciphertexts to
A, and A calculates

$C = (g_2^{r_1} / g_2^{r_n})^{r_{n+1}} \prod_{i=1}^{n} C_i = (1 + p \sum_{i=1}^{n} x_i) \mod p^2$

Then, he can compute the final sum result $\sum_{i=1}^{n} x_i$.

VI. Efficient Protocols for General Multivariate 
Polynomial 

Now we are ready to present our efficient privacy preserving
protocols for evaluating a multivariate polynomial. Our protocol
is based on the efficient protocols for sum and product
presented in the previous section.

A. One Aggregator Model

The calculation of the polynomial (1) can be divided into nm
multiplications and m additions. In this section we show how
to conduct a joint calculation of m products and one sum while
preserving each individual's data privacy in the One Aggregator
Model. Different from the protocols in Section V, the
ciphertexts are not broadcast this time; they are sent
to the aggregator instead. The purpose of this small change
is only to reduce communication complexity; from the
security perspective, this is the same as broadcasting since
our communication channels are insecure.

1) Basic Scheme: All the participants execute Setup to
initiate the system. Then, for each k, all the participants need
to calculate the $x_i^{d_{i,k}}$'s first, where the $d_{i,k}$'s are powers specified by
the aggregator A, and run the aforementioned product protocol
for each $k \in [1, m]$. If A does not need the data from some
participant $p_i$, A can set his powers to be 0, and if $p_i$ does
not want to participate in the aggregation, he can simply set
his input as 1.

Then, the aggregator is able to calculate

$\sum_{k=1}^{m} ( c_k \prod_{i=1}^{n} x_i^{d_{i,k}} )$.
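
Putting the pieces together, the Basic Scheme can be sketched as m runs of the aggregator product protocol followed by a plain weighted sum at the aggregator. Several points are assumptions of this sketch rather than statements from the paper: the toy modulus and base, the helper names, and in particular the choice to draw fresh masks for every term instead of reusing one Setup output.

```python
import secrets

P = 2**61 - 1  # toy prime modulus (assumption of this sketch)
G = 3          # fixed base, not verified to be a generator

def ring_masks(n_total):
    """Secret masks for a ring of n_total members; their product is 1 mod P."""
    r = [secrets.randbelow(P - 2) + 1 for _ in range(n_total)]
    Y = [pow(G, r_i, P) for r_i in r]
    return [
        pow(Y[(i + 1) % n_total] * pow(Y[(i - 1) % n_total], -1, P) % P, r[i], P)
        for i in range(n_total)
    ]

def secure_term_product(values):
    """One run of the aggregator product protocol on already powered inputs."""
    n = len(values)
    masks = ring_masks(n + 1)  # assumption: fresh masks per term
    result = masks[n]          # aggregator's own mask
    for i, value in enumerate(values):
        result = result * (value * masks[i] % P) % P  # ciphertext C_i
    return result

def evaluate_polynomial(xs, c, d):
    """Aggregator side: sum over k of c[k] times the k-th recovered product."""
    total = 0
    for k, c_k in enumerate(c):
        powered = [pow(x, d[i][k], P) for i, x in enumerate(xs)]
        total += c_k * secure_term_product(powered)
    return total

# f(x) = 1 * x1 * x2^2 + 5 * x2 with x = (2, 3)
print(evaluate_polynomial([2, 3], [1, 5], [[1, 0], [2, 1]]))  # 33
```

In this example the second term depends on a single participant only, which is exactly the situation the Advanced Scheme is designed to protect.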

2) Advanced Scheme: The above Basic Scheme preserves
data privacy in our problem as long as there are
at least two $x_i^{d_{i,k}}$'s not equal to 1 in each of the following sets
$\{x_1^{d_{1,k}}, x_2^{d_{2,k}}, \cdots, x_n^{d_{n,k}}\}_{k \in \{1, \cdots, m\}}$, which will be further
discussed in Section VII-B1. Therefore, to handle the remaining case, we exploit the
aforementioned sum protocol to achieve a secure scheme.

All the participants execute Setup. Then, when executing
the Encrypt of the product protocol, each participant checks
whether his input is the only one not equal to 1 for each
product $\prod_{i=1}^{n} x_i^{d_{i,k}}$ (i.e., his $d_{i,k}$ is the only one not equal to
0 in $\{d_{1,k}, d_{2,k}, \cdots, d_{n,k}\}$). If it is, the product equals his input
data, which will directly disclose his data, so he skips it. The
elements that are omitted form a set $D_{sum} = \{x_i^{d_{i,k}}\}_{k \in I_{sum}}$,
where $I_{sum}$ is the set of indices k corresponding to the
elements in $D_{sum}$. For each $x_i^{d_{i,k}} \in D_{sum}$, find its owner
$p_i$ and add him into the set $P_{sum}$. There can be duplicate
$p_i$'s in the set $P_{sum}$. The $p_i$'s in $P_{sum}$ need to calculate the
following without knowing each other's input:

$\sum_{p_i \in P_{sum}} c_k x_i^{d_{i,k}}$

They are called sum participants, and we assume they are
sorted in non-decreasing order of their indices in $P_{sum}$ and
arranged in a circle. In what follows, we denote $p_i$'s successor
and predecessor in $P_{sum}$ as $p_{i_{suc}}$ and $p_{i_{pre}}$ respectively.
These sum participants run the sum protocol to encrypt their
data and send it to the aggregator A.

A, after receiving all the sum ciphertexts, is able to
calculate $\sum_{k \in I_{sum}} c_k x_i^{d_{i,k}}$. Then, he is able to calculate

$\sum_{k=1}^{m} ( c_k \prod_{i=1}^{n} x_i^{d_{i,k}} )$.
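
The check that selects the sum participants can be sketched from the public powers alone. This is a simplification and an assumption of the sketch: it looks only at which powers are non-zero, whereas a participant in the protocol may additionally know that his own input equals 1. The function name `single_owner_terms` is hypothetical.

```python
def single_owner_terms(d):
    """Map each term index k with exactly one non-zero power to its owner i.

    d[i][k] is the public power of participant i's input in term k.
    The returned keys play the role of I_sum and the values of P_sum.
    """
    n, m = len(d), len(d[0])
    owners = {}
    for k in range(m):
        nonzero = [i for i in range(n) if d[i][k] != 0]
        if len(nonzero) == 1:
            owners[k] = nonzero[0]
    return owners

d = [[1, 0, 2],   # powers of x1 in terms 1..3
     [1, 3, 0],   # powers of x2
     [0, 0, 0]]   # powers of x3
print(single_owner_terms(d))  # {1: 1, 2: 0}
```

Terms listed in the result would be handled by the sum protocol, while the remaining terms go through the product protocol as in the Basic Scheme.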

B. Participants Only Model 

From the One Aggregator Model, we know that the combination
of the two protocols (product protocol and sum protocol)
proposed in Section V gives the best scheme. Therefore we
only show the scheme which employs both product and sum
protocols.

Advanced Scheme: Every participant executes Setup, and
when he executes the Encrypt of the product protocol, he
conducts the same examination as in Section VI-A2 above.
Then, the sum participants run the sum protocol to share
their sum with each other. Finally, all participants are able to
calculate $\sum_{k=1}^{m} ( c_k \prod_{i=1}^{n} x_i^{d_{i,k}} )$ based on others' ciphertexts.

VII. Correctness, Complexity and Security 
Analysis 

Here we provide rigorous correctness proofs and complexity 
and security analyses of the protocols presented in this paper. 
We also discuss when our protocols could leak information 
about the privately known data $x_i$ and provide methods to 
address this when possible. 

A. Correctness 

Next we show the correctness of the product protocol in 
Section V. 

1) Product Protocol: We only provide the analysis for the 
Participants Only Model, but the correctness in the One Aggregator 
Model is easily derivable from it. After participants receive 
$\{C_1, \dots, C_n\}$, they conduct the following calculation: 

$$\begin{aligned}
\prod_{i=1}^{n} C_i &= \prod_{i=1}^{n} x_i \, g_1^{(r_{i+1} - r_{i-1}) r_i} \\
&= \Big(\prod_{i=1}^{n} x_i\Big) \prod_{i=1}^{n} g_1^{r_{i+1} r_i - r_i r_{i-1}} \\
&= \Big(\prod_{i=1}^{n} x_i\Big) \, g_1^{\sum_{i=1}^{n} (r_{i+1} r_i - r_i r_{i-1})} = \prod_{i=1}^{n} x_i
\end{aligned}$$

Here $r_{n+1} = r_1$ and $r_0 = r_n$, so the exponent sum telescopes 
to zero. Thus, the products are correctly calculated. 
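The cancellation above can be checked numerically. The following is a minimal sketch with toy parameters ($p = 23$, $q = 11$, $g_1 = 2$) chosen only for illustration; the function name `product_round` is hypothetical, and the random values $r_i$ are generated in one place here, whereas in the protocol each participant holds only his own $r_i$.

```python
import random


def product_round(xs, p, q, g1, rng):
    """Simulate one product aggregation over a ring of len(xs) participants.

    Each ciphertext is C_i = x_i * (g1^{r_{i+1}} / g1^{r_{i-1}})^{r_i} mod p,
    with ring indices r_{n+1} = r_1 and r_0 = r_n.
    Returns (ciphertexts, product of all ciphertexts mod p).
    """
    n = len(xs)
    r = [rng.randrange(1, q) for _ in range(n)]
    ciphertexts = []
    for i in range(n):
        nxt = pow(g1, r[(i + 1) % n], p)
        prv = pow(g1, r[(i - 1) % n], p)
        prv_inv = pow(prv, p - 2, p)  # inverse via Fermat's little theorem
        mask = pow(nxt * prv_inv % p, r[i], p)
        ciphertexts.append(xs[i] * mask % p)
    total = 1
    for c in ciphertexts:
        total = total * c % p
    return ciphertexts, total


# Toy check: the masks cancel, leaving the product of the inputs mod p.
cs, result = product_round([3, 4, 6], p=23, q=11, g1=2, rng=random.Random(0))
```

With these inputs the aggregate equals $3 \cdot 4 \cdot 6 \bmod 23$, independent of the random $r_i$.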

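2) Sum Protocol: Similar to above, we only discuss the 
correctness for the Participants Only Model. After participants 
receive $\{C_1, \dots, C_n\}$, they conduct the following calculation: 

$$\begin{aligned}
C = \prod_{i=1}^{n} C_i &= \prod_{i=1}^{n} (1 + x_i p)\,(g_2^{r_{i+1}} / g_2^{r_{i-1}})^{r_i} \\
&= \prod_{i=1}^{n} (1 + x_i p) \\
&= \Big(1 + p \sum_{i=1}^{n} x_i\Big) \bmod p^2
\end{aligned}$$

Thus, $(C - 1)/p \bmod p$ is indeed equal to $\sum_{i=1}^{n} x_i \bmod p$. 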
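As a numerical illustration of this derivation, the sketch below runs the sum aggregation with toy parameters ($p = 23$, $g_2 = 5$). The function name `sum_round` is hypothetical, and the $r_i$ are drawn centrally only for the sake of the simulation.

```python
import random


def sum_round(xs, p, g2, rng):
    """Simulate one sum aggregation modulo p^2.

    Each ciphertext is C_i = (1 + x_i*p) * (g2^{r_{i+1}} / g2^{r_{i-1}})^{r_i}
    mod p^2. Returns (C - 1)/p mod p, where C is the product of all C_i.
    """
    n = len(xs)
    p2 = p * p
    order = p * (p - 1)  # size of the unit group modulo p^2
    r = [rng.randrange(1, order) for _ in range(n)]
    c = 1
    for i in range(n):
        nxt = pow(g2, r[(i + 1) % n], p2)
        prv = pow(g2, r[(i - 1) % n], p2)
        prv_inv = pow(prv, order - 1, p2)  # inverse of a unit via Euler's theorem
        mask = pow(nxt * prv_inv % p2, r[i], p2)
        c = c * ((1 + xs[i] * p) * mask % p2) % p2
    return (c - 1) // p % p


total = sum_round([3, 4, 6], p=23, g2=5, rng=random.Random(1))
```

The result is the sum of the inputs reduced modulo $p$, so the true sum is recovered only when it is smaller than $p$.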
B. Security 

We discuss the security of the schemes in both One Aggre- 
gator Model and Participants Only Model in this section. 

1) Special Case of Products Calculation: As mentioned 
in Section VI-A2, if only one exponent $d_{i,k}$ is nonzero 
in some set $\{d_{1,k}, d_{2,k}, \dots, d_{n,k}\}$, $k \in \{1, \dots, m\}$, 
during the products calculation, the individual data $x_i$ can be 
disclosed to others. This is because, supposing that $d_{i,k}$ is the 
only nonzero exponent in the set $\{d_{1,k}, d_{2,k}, \dots, d_{n,k}\}$ (so that 
$x_i^{d_{i,k}}$ is the only factor not equal to 1), 

$$\mathrm{Decrypt}(\{C_{1,k}, C_{2,k}, \dots, C_{n,k}\}) = x_i^{d_{i,k}}$$

and $x_i$ is disclosed to others if $c_k \neq 0$. Therefore, in this 
case, the participants should conduct an additional secure sum 
calculation before sending the ciphertexts to others. 

2) Randomness and Group Selection: In the product 
calculation protocol, the group $G_1$ should be carefully selected 
to make the input $x_i$ indistinguishable from a random element. 
We select a cyclic multiplicative group $G_1 \subset Z_p$ of prime 
order $q$ as follows. Find two large prime numbers $p, q$ such 
that $p = kq + 1$ for some integer $k$. Then, find a generator 
$h$ for $Z_p$, and set $g_1 := h^{(p-1)/q}$ modulo $p$ (clearly $g_1 \neq 1$ 
modulo $p$). Then group $G_1$ is generated by $g_1$, whose order is 
$q$. Here the powers of the numbers in $G_1$ belong to the integer 
group $Z_q$. 
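The group selection above can be sketched for toy sizes as follows. This is only an illustration under stated assumptions: trial division stands in for the primality testing and factoring that large parameters would require, and the helper names (`prime_factors`, `is_prime`, `select_group`) are hypothetical.

```python
def prime_factors(n):
    """Return the set of prime factors of n by trial division (toy sizes only)."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_prime(n):
    return n >= 2 and prime_factors(n) == {n}


def select_group(q):
    """Find p = k*q + 1 prime, a generator h of Z_p^*, and g1 = h^((p-1)/q) mod p."""
    assert is_prime(q)
    k = 2
    while not is_prime(k * q + 1):
        k += 1
    p = k * q + 1
    factors = prime_factors(p - 1)
    # h generates Z_p^* iff h^((p-1)/f) != 1 for every prime factor f of p-1.
    h = next(a for a in range(2, p)
             if all(pow(a, (p - 1) // f, p) != 1 for f in factors))
    g1 = pow(h, (p - 1) // q, p)
    return p, h, g1


p, h, g1 = select_group(11)  # (23, 5, 2)
```

For $q = 11$ this yields $p = 23$, $h = 5$ and $g_1 = 2$, and $g_1$ has order $q$ because $2^{11} \equiv 1 \pmod{23}$ while $g_1 \neq 1$.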

Next, we show that any input data $x_i$ is computationally 
indistinguishable from any random element chosen from $Z_p$ via 
PRE 

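For any $i$, we have 

$$C_i = x_i \, (g_1^{r_{i+1}} / g_1^{r_{i-1}})^{r_i} = x_i \, g_1^{(r_{i+1} - r_{i-1}) r_i}.$$

Let $x_i$ be $g_1^{\chi_i}$ and $r_{i+1} - r_{i-1}$ be $\gamma_i$, where $\chi_i \in Z_q$ and 
$\gamma_i \in Z_q$ (this is possible since $g_1$ is a generator of the group 
$G_1$). Then, $C_i = g_1^{\chi_i} g_1^{\gamma_i r_i}$. 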

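Theorem VII.1. $\forall \chi_i, r_i \in Z_q$, $\exists \hat{r}_i, \hat{\chi}_i \in Z_q$ such that 

$$g_1^{\chi_i} g_1^{\gamma_i r_i} = g_1^{\hat{\chi}_i} g_1^{\gamma_i \hat{r}_i} \bmod p.$$

Proof: For any $r_i, \hat{r}_i \in Z_q$, there exists $\hat{\chi}_i \in Z_q$ such 
that: 

$$\gamma_i (r_i - \hat{r}_i) = \hat{\chi}_i - \chi_i \bmod q$$

because $q$ and $(r_i - \hat{r}_i)$ are relatively prime ($q$ is a prime 
number). Then we have $\hat{\chi}_i \in Z_q$ for any $\hat{r}_i \in Z_q$ such that: 

$$g_1^{\gamma_i (r_i - \hat{r}_i)} = g_1^{\hat{\chi}_i - \chi_i} \bmod p \;\Rightarrow\; g_1^{\chi_i} g_1^{\gamma_i r_i} = g_1^{\hat{\chi}_i} g_1^{\gamma_i \hat{r}_i} \bmod p$$

This implies that given the ciphertext $C_i$, any value $\hat{x}_i$ is a 
possible valid data that can produce this 'ciphertext' $C_i$. 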
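The statement of Theorem VII.1 can be checked exhaustively on a toy group. The sketch below assumes $p = 23$, $q = 11$ and $g_1 = 2$ (an element of order 11 modulo 23); the function names are hypothetical.

```python
def equivalent_exponent(chi, gamma, r, r_hat, q):
    """Return chi_hat satisfying gamma*(r - r_hat) = chi_hat - chi (mod q)."""
    return (chi + gamma * (r - r_hat)) % q


def ciphertext(chi, gamma, r, p, g1):
    """C = g1^chi * g1^(gamma*r) mod p."""
    return pow(g1, chi, p) * pow(g1, gamma * r, p) % p


p, q, g1 = 23, 11, 2
chi, gamma, r = 4, 3, 7
target = ciphertext(chi, gamma, r, p, g1)

# For every alternative randomness r_hat, some chi_hat yields the same ciphertext.
alternatives = [equivalent_exponent(chi, gamma, r, r_hat, q) for r_hat in range(q)]
matches = [ciphertext(chi_hat, gamma, r_hat, p, g1) == target
           for r_hat, chi_hat in enumerate(alternatives)]
```

When $\gamma_i \neq 0$, the values $\hat{\chi}_i$ obtained this way range over all of $Z_q$, which is the sense in which every plaintext is consistent with a given ciphertext.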
According to Theorem VII.1, we can deduce that $\chi_i$ 
has the same level of randomness as $r_i$. Therefore, $g_1^{\chi_i}$ is 
indistinguishable from a random element in $G_1$ from other 
participants' or attackers' perspective, which implies 

Theorem VII.2. The input $x_i$ is computationally indistin- 
guishable from a random element chosen from $G_1$. 

3) Closure and Group Selection: We need to guarantee 
that all the multiplications in the sum protocol are closed in 
$G_2$. Since $(1 + x_i p) \cdot (g_2^{r_{i+1}} / g_2^{r_{i-1}})^{r_i}$ is the only multiplication 
throughout the sum protocol, we must carefully choose the 
group $G_2$ such that $1 + x_i p \in G_2$. We let $G_2 \subset Z_{p^2}$ be a cyclic 
multiplicative group generated by $h$, which is the generator 
of $Z_p$. Then, the order of $G_2$ is $p(p - 1)$, and the powers 
of the numbers in $G_2$ belong to the integer group $Z_{p(p-1)}$. 
Since $G_2 = Z_{p^2} - \{x \mid x = k \cdot p, \text{ for some integer } k\}$ and 
$\forall k : 1 + x_i p \neq kp$, $1 + x_i p$ belongs to the group $G_2$. 

4) Restriction of the Product and Sum Protocol: In both 
protocols, we require that the number of participants is at least 3 
in the Participants Only Model and at least 2 in the One Aggregator 
Model. In the Participants Only Model, if there are only 2 partic- 
ipants, privacy is not preservable since it is impossible to let 
$p_1$ know $x_1 + x_2$ or $x_1 x_2$ without knowing $x_2$. However, in the 
One Aggregator Model, since only the aggregator $A$ knows 
the final result, as long as there are two participants, $A$ is not 
able to infer any individual's input data. 

C. Complexity 

We discuss the computation and communication complexity 
of the Advanced Scheme for each model in this section. 

1) One Aggregator Model: It is easy to see that the 
computation complexities of Setup, Encrypt and Product of 
the product protocol are $O(1)$, $O(1)$ and $O(n)$ respectively. 
Also, Encrypt is executed $m$ times by each participant 
and Product is executed $m$ times by the aggregator in the 
Advanced Scheme. 

Every participant and the aggregator exchange $g^{r_i}$'s with 
each adjacent neighbor in the ring, which incurs communi- 
cation of $O(|p|)$ bits in Setup, where $|p|$ represents the bit 
length of $p$. In Encrypt, each participant sends $m$ cipher- 
texts, one for each product $c_k \prod_{i=1}^{n} x_i^{d_{i,k}}$, to the aggregator, so the 
communication overhead of Encrypt is $O(m|p|)$ bits. Since $n$ 
participants are sending the ciphertexts to the aggregator, the 
aggregator's communication overhead is $O(mn|p|)$. 

Similarly, the computation complexities of Setup, Encrypt 
and Sum in the sum protocol are $O(1)$, $O(1)$ and $O(m)$ 
respectively, and they are executed only once in the 
scheme. Hence, the communication overheads of Setup, En- 
crypt and Sum are $O(|p^2|)$ bits, $O(|p^2|)$ bits and $O(m|p^2|)$ 
bits respectively ($|p^2|$ is the bit length of $p^2$). 

Note that $|p^2| = 2|p|$. Then, the total complexities of the 
aggregator and participants are as follows: 

TABLE II 
One Aggregator Model 

Aggregator           Computation   Communication (bits) 
Product (Product)    $O(mn)$       $O(mn|p|)$ 
Sum (sum)            $O(m)$        $O(m|p|)$ 

Per Participant      Computation   Communication (bits) 
Setup (Product)      $O(1)$        $O(|p|)$ 
Encrypt (Product)    $O(m)$        $O(m|p|)$ 
Setup (sum)          $O(1)$        $O(|p|)$ 
Encrypt (sum)        $O(1)$        $O(|p|)$ 



2) Participants Only Model: In the Participants Only 
Model, participants broadcast ciphertexts to others and cal- 
culate the products and sums themselves. The resulting com- 
plexities are shown below: 



TABLE III 
Participants Only Model 

Per Participant      Computation   Communication (bits) 
Setup (Product)      $O(1)$        $O(|p|)$ 
Encrypt (Product)    $O(m)$        $O(mn|p|)$ 
Product (Product)    $O(mn)$       $O(mn|p|)$ 
Setup (sum)          $O(1)$        $O(|p|)$ 
Encrypt (sum)        $O(1)$        $O(m|p|)$ 
Sum (sum)            $O(m)$        $O(m|p|)$ 



Note that the communication overhead is balanced in the 
Participants Only Model, but the system-wide communication 
overhead is substantially larger. In the One Aggregator Model, the 
system-wide communication overhead is: 

$$O(mn|p|) + O(m|p|) + n \cdot O(|p|) = O(mn|p|) \text{ (bits)}$$

However, in the Participants Only Model, the system-wide 
communication complexity is: 

$$n \cdot O(|p|) + n \cdot O(m|p|) + n \cdot O(mn|p|) = O(mn^2|p|) \text{ (bits)}$$
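To make the gap between the two models concrete, the following sketch evaluates both expressions with every hidden constant set to 1. This is an assumption made only for illustration, so the outputs are relative magnitudes rather than measured traffic; the function name `system_wide_bits` is hypothetical.

```python
def system_wide_bits(m, n, p_bits):
    """Evaluate the two system-wide communication expressions with unit constants.

    m: number of products, n: number of participants, p_bits: bit length |p|.
    Returns (one_aggregator, participants_only).
    """
    one_aggregator = m * n * p_bits + m * p_bits + n * p_bits
    participants_only = n * p_bits + n * (m * p_bits) + n * (m * n * p_bits)
    return one_aggregator, participants_only


# Illustrative values: 4 products, 10 participants, |p| = 270 bits.
agg, peers = system_wide_bits(m=4, n=10, p_bits=270)
ratio = peers / agg
```

For these values the Participants Only Model costs roughly $n$ times more in total, matching the extra factor of $n$ in $O(mn^2|p|)$.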

VIII. Performance Evaluation by Implementation 

We conduct extensive evaluations of our protocols. Our 
simulation results show that the computation complexity of 
our protocol is indeed linear in the number of participants. 
To simulate and measure the computation overhead, we used the 
GMP library to implement large-number operations in our 
protocol on a computer with an Intel i7-2620M @ 2.70GHz 
CPU and 2GB of RAM, and each result is the average time 
measured over 100,000 executions. Also, the input 
data $x_i$ is of 20-bit length, $q$ is of 256-bit length, and $p$ 
is roughly of 270-bit length. That is, $x_i$ is a number from 
$[0, 2^{20} - 1]$ and $q$ is a uniform random number chosen from 
$[0, 2^{256} - 1]$. 

In this simulation, we measured the total overhead of our 
novel product protocol and sum protocol (the second sum 
protocol) proposed in Section V. Here, we measured the 
total computation time spent in calculating the final result 
of $n$ data (including encryption by $n$ participants and 
decryption by the aggregator). Since we only measure the 
computation overhead, there is no difference between the One 
Aggregator Model and the Participants Only Model. 



[Figure 3: two panels, (a) product and (b) sum; legend entry: "Our Product".] 

Fig. 3. Running time for product and sum calculation. 

First of all, the computation overhead of each protocol is 
indeed proportional to the number of participants. Also, the 
sum protocol needs much more time. This is natural because 
parameters in the sum protocol are in $Z_{p^2}$, so their bit length is 
twice that of the parameters in the product protocol (which are 
in $Z_p$). 

Multivariate polynomial evaluation is composed of $m$ prod- 
ucts and one sum, so its computation overhead is simply the 
combination of the above two protocols' overheads. 

We further compare the performance of our protocol with 
an existing multi-party computation system implemented 
by Ben et al. [2] (FairplayMP). They implemented the BMR 
protocol [1], which requires a constant number of communica- 
tion rounds regardless of the function being computed. Their 
system provides a platform for general secure multi-party 
computation (SMC), where one can program a secure com- 
putation with the Secure Function Definition Language (SFDL). 
Programs written in SFDL enable multiple parties to jointly 
evaluate an arbitrary-sized boolean circuit. This boolean circuit 
is the same as the garbled circuit proposed in Yao's 2 Party 
Computation (2PC) [22], [23]. 

In Ben's setting, where they used a grid of computers, each 
with two Intel Xeon 3GHz CPUs and 4GB of RAM, they 
achieved the computation times in the following table with 
5 participants: 

TABLE IV 
Run time (milliseconds) of FairplayMP [2] 

Gates              32    64    128   256   512   1024 
Per Participant    64    130   234   440   770   1394 



One addition of two $k$-bit numbers can be expressed with 
$k + 1$ XOR gates and $k$ AND gates. Therefore, if we set 
the length of the input data to 20 bits (i.e., values up to 
approximately 1 million), we need 41 gates per addition in the 
FairplayMP system. When we conduct 26 additions (which is 
equivalent to 1066 gates) in our system, the total computation time 
is 72.2 microseconds, which is $2 \times 10^4$ times faster than 
FairplayMP, which needs 1.394 seconds to evaluate a 
boolean circuit of 1024 gates. Even though we do not count 
the aggregator's computation time in FairplayMP (they report 
only its total run time including communication delay, not its 
pure computation time), our addition is already faster than their 
system. Obviously, the multiplication is much faster since it is 
roughly 8 times faster than the addition in our system. 
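The arithmetic behind this comparison can be reproduced directly from the figures quoted above. The sketch below uses only those numbers; the variable and function names are illustrative.

```python
def gates_per_addition(k):
    """Gate count for adding two k-bit numbers: (k + 1) XOR gates plus k AND gates."""
    return (k + 1) + k


gates = gates_per_addition(20)        # gates for one 20-bit addition
equivalent_gates = 26 * gates         # 26 additions expressed as gates

our_time_s = 72.2e-6                  # 26 additions in the proposed system
fairplay_time_s = 1.394               # 1024-gate circuit in FairplayMP (Table IV)
speedup = fairplay_time_s / our_time_s
```

This gives 41 gates per addition, 1066 gates for 26 additions, and a ratio of about $1.9 \times 10^4$, which the text rounds to $2 \times 10^4$. The comparison is slightly conservative, since 1066 gates is a little more work than the 1024-gate circuit timed for FairplayMP.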

We also compare our system with an efficient homomorphic 
encryption implementation [16]. Lauter et al. proposed an 
efficient homomorphic encryption scheme which limits the 
total number of multiplications to a small number less than 
100. If only one multiplication is allowed in their scheme 
(the fastest setting) and the length of the modulus $q$ is 1024, it 
takes 1 millisecond to conduct an addition and 41 milliseconds 
to conduct a multiplication. In our system, under the same 
condition, it takes 16.2 microseconds to conduct an addition 
and 0.7 microseconds to conduct a multiplication, which are 
approximately 60 times and $6 \times 10^4$ times faster respectively. 
They implemented the system on a computer with two Intel 
2.1GHz CPUs and 2GB of RAM. Even considering that our 
computer has a higher CPU clock rate, their scheme is still much 
slower than ours. 



TABLE V 
Comparison between [16] and our system 

               Addition             Multiplication 
Lauter [16]    1 millisecond        41 milliseconds 
Ours           16.2 microseconds    0.7 microseconds 



The purposes of the above two systems are quite different 
from ours: FairplayMP is for general multi-party 
computation, and the homomorphic encryption system 
is for general homomorphic encryption. They also provide a 
much higher level of security than ours since they achieve 
differential privacy. However, the comparison above does show 
the high speed of our system, while our security level is still 
acceptable in real-life applications; this is one of the main 
contributions of this paper. 

IX. Conclusion 

In this paper, we achieve privacy-preserving 
multivariate polynomial evaluation without secure communi- 
cation channels by introducing our novel secure product and 
sum calculation protocols. We also show in the discussion 
that our proposed construction is efficient and secure enough 
to be applicable in real life. However, our scheme discloses 
each product part in the polynomial, which gives unnecessary 
information to attackers. Therefore, our next research goal is to 
minimize the information leakage during the computation 
and communication. Another direction for future work is to design 
privacy-preserving data releasing protocols such that certain 
functions can be evaluated correctly while certain functional 
privacy is protected. 